Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis
Abstract
OBJECTIVE We describe experiments designed to determine the feasibility of distinguishing known from novel associations in a clinical dataset comprising International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients, by comparing them with associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations that appear in the clinical dataset but not in Medline citations are potentially novel.
METHODS Pairwise associations of ICD-9 codes were identified independently in the clinical and Medline datasets, and the two sets were then compared to quantify their degree of overlap. We also manually reviewed a subset of the associations to validate how well MetaMap identified the diagnoses mentioned in Medline citations, which formed the basis of the Medline associations.
RESULTS The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses in Medline citations do not always represent clinically meaningful associations.
DISCUSSION Identifying novel associations derived from large clinical datasets remains challenging. Medline as the sole data source for existing knowledge may not be adequate to filter out widely known associations.
CONCLUSIONS In this study, novel associations were not readily identified. Further improvements in the accuracy and relevance of tools such as MetaMap are needed to realize their expected utility.
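The comparison described under METHODS can be illustrated with a minimal sketch. It assumes each patient record or Medline citation has already been reduced to a list of ICD-9 codes, and it uses plain co-occurrence as a stand-in for the association measure, which the abstract does not specify. The function names and the toy codes are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pairwise_associations(records):
    """Return the set of unordered code pairs that co-occur in any record.

    Assumption: simple co-occurrence stands in for the study's
    association criterion, which the abstract does not describe.
    """
    pairs = set()
    for codes in records:
        # Deduplicate and sort so each pair is stored once as (a, b), a < b.
        for a, b in combinations(sorted(set(codes)), 2):
            pairs.add((a, b))
    return pairs


def overlap_fraction(clinical_pairs, medline_pairs):
    """Fraction of clinical pairs that also appear among the Medline pairs."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & medline_pairs) / len(clinical_pairs)


# Toy inputs (hypothetical): one list of ICD-9 codes per patient or citation.
clinical = [
    ["250.00", "401.9"],
    ["401.9", "272.4", "250.00"],
    ["493.90", "477.9"],
]
medline = [
    ["250.00", "401.9"],
    ["530.81", "493.90"],
]

clinical_pairs = pairwise_associations(clinical)
medline_pairs = pairwise_associations(medline)

# Pairs seen only in the clinical data are the candidate novel associations.
candidate_novel = clinical_pairs - medline_pairs
fraction = overlap_fraction(clinical_pairs, medline_pairs)
```

In this sketch, `fraction` plays the role of the 6.6% overlap reported in RESULTS, and `candidate_novel` corresponds to the associations that would still need review before being called novel.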
Journal title:
- Journal of the American Medical Informatics Association : JAMIA
Volume 21, Issue 5
Pages: -
Publication date: 2014